Abstract

Solomonoff's central result on induction is that the posterior of a universal semimeasure $M$ converges rapidly and with probability 1 to the true sequence generating posterior $\mu$, if the latter is computable. Hence, $M$ is eligible as a universal sequence predictor in case of unknown $\mu$. Despite some nearby results and proofs in the literature, the stronger result of convergence for all (Martin-Löf) random sequences remained open. Such a convergence result would be particularly interesting and natural, since randomness can be defined in terms of $M$ itself. We show that there are universal semimeasures $M$ which do not converge for all random sequences, i.e. we give a partial negative answer to the open problem. We also provide a positive answer for some non-universal semimeasures. We define the incomputable measure $D$ as a mixture over all computable measures and the enumerable semimeasure $W$ as a mixture over all enumerable nearly-measures. We show that $W$ converges to $D$ and $D$ to $\mu$ on all random sequences. The Hellinger distance measuring closeness of two distributions plays a central role.
1 Introduction

A sequence prediction task consists of predicting the next symbol $x_n$ from an observed sequence $x = x_1...x_{n-1}$. The key concept to attack general prediction problems is Occam's razor, and to a lesser extent Epicurus' principle of multiple explanations. The former/latter may be interpreted as keeping the simplest/all theories consistent with the observations $x_1...x_{n-1}$ and using these theories to predict $x_n$. Solomonoff [Sol64, Sol78] formalized and combined both principles in his universal prior $M$, which assigns high/low probability to simple/complex environments $x$, hence implementing Occam and Epicurus. Formally it is a mixture of all enumerable semimeasures. An abstract characterization of $M$ by Levin [ZL70] is that $M$ is a universal enumerable semimeasure in the sense that it multiplicatively dominates all enumerable semimeasures.

Solomonoff's [Sol78] central result is that if the probability $\mu(x_n|x_1...x_{n-1})$ of observing $x_n$ at time $n$, given past observations $x_1...x_{n-1}$, is a computable function, then the universal posterior $M_n := M(x_n|x_1...x_{n-1})$ converges (rapidly!) with $\mu$-probability 1 (w.p.1) for $n\to\infty$ to the true posterior $\mu_n := \mu(x_n|x_1...x_{n-1})$; hence $M$ represents a universal predictor in case of unknown "true" distribution $\mu$. Convergence of $M_n$ to $\mu_n$ w.p.1 tells us that $M_n$ is close to $\mu_n$ for sufficiently large $n$ for almost all sequences $x_1x_2...$. It says nothing about whether convergence is true for any particular sequence (of measure 0).

Martin-Löf (M.L.) randomness is the standard notion for randomness of individual sequences [ML66, LV97]. An M.L.-random sequence passes all thinkable effective randomness tests, e.g. the law of large numbers, the law of the iterated logarithm, etc. In particular, the set of all $\mu$-random sequences has $\mu$-measure 1. It is natural to ask whether $M_n$ converges to $\mu_n$ (in difference or ratio) individually for all M.L.-random sequences. Clearly, Solomonoff's result shows that convergence may at most fail for a set of sequences with $\mu$-measure zero. A convergence result for M.L.-random sequences would be particularly interesting and natural in this context, since M.L.-randomness can be defined in terms of $M$ itself [Lev73]. Despite several attempts to solve this problem [Vov87, VL00, Hut03b], it remained open [Hut03c].

In this paper we construct an M.L.-random sequence and show the existence of a universal semimeasure which does not converge on this sequence, hence answering the open question negatively for some $M$. It remains open whether there exist (other) universal semimeasures, probably with particularly interesting additional structure and properties, for which M.L.-convergence holds. The main positive contribution of this work is the construction of a non-universal enumerable semimeasure $W$ which M.L.-converges to $\mu$ as desired. As an intermediate step we consider the incomputable measure $D$, defined as a mixture over all computable measures. We show posterior M.L.-convergence of $W$ to $D$ and of $D$ to $\mu$. The Hellinger distance measuring closeness of two posterior distributions plays a central role in this work.

The paper is organized as follows: In Section 2 we give basic notation and results (for strings, numbers, sets, functions, asymptotics, computability concepts, prefix Kolmogorov complexity), and define and discuss the concepts of (universal) (enumerable) (semi)measures. Section 3 summarizes Solomonoff's and Gács' results on posterior convergence of $M$ to $\mu$ with probability 1. Both results can be derived from a bound on the expected Hellinger sum. We present an improved bound on the expected exponentiated Hellinger sum, which implies very strong assertions on the convergence rate. In Section 4 we investigate whether convergence holds for all Martin-Löf random sequences. We construct a universal semimeasure $M$ and a $\mu$-M.L.-random sequence on which $M$ does not converge to $\mu$, for some computable $\mu$. In Section 5 we present our main positive result. We derive a finite bound on the Hellinger sum between $\mu$ and $D$, which is exponential in the randomness deficiency of the sequence and double exponential in the complexity of $\mu$. This implies that the posterior of $D$ M.L.-converges to $\mu$. Finally, in Section 6 we show that $W$ is non-universal and asymptotically M.L.-converges to $D$. Section 7 contains discussion and outlook.



2 Notation & Universal Semimeasures M 

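Strings. Let $i,k,n,t \in \mathbb{N} = \{1,2,3,...\}$ be natural numbers, and $x,y,z \in \mathcal{X}^* = \bigcup_{n=0}^\infty \mathcal{X}^n$ be finite strings of symbols over finite alphabet $\mathcal{X} \ni a,b$. We denote strings $x$ of length $\ell(x) = n$ by $x = x_1x_2...x_n \in \mathcal{X}^n$ with $x_t \in \mathcal{X}$ and further abbreviate $x_{k:n} := x_kx_{k+1}...x_{n-1}x_n$ for $k \leq n$, and $x_{<n} := x_1...x_{n-1}$, and $\epsilon = x_{<1} = x_{n+1:n} \in \mathcal{X}^0 = \{\epsilon\}$ for the empty string. Let $\omega = x_{1:\infty} \in \mathcal{X}^\infty$ be a generic and $\alpha \in \mathcal{X}^\infty$ a specific infinite sequence. For a given sequence $x_{1:\infty}$ we say that $x_t$ is on-sequence and $\bar x_t \neq x_t$ is off-sequence. $x'_t$ may be on- or off-sequence. We identify strings with natural numbers (including zero, $\mathcal{X}^* \cong \mathbb{N} \cup \{0\}$).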

Sets and functions. $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{R}_+ := [0,\infty)$ are the sets of fractional, real, and non-negative real numbers, respectively. $\#S$ denotes the number of elements in set $S$, $\ln()$ the natural and $\log()$ the binary logarithm.

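Asymptotics. We abbreviate $\lim_{n\to\infty}[f(n)-g(n)] = 0$ by $f(n) \stackrel{n\to\infty}{\longrightarrow} g(n)$ and say $f$ converges to $g$, without implying that $\lim_{n\to\infty} g(n)$ itself exists. We write $f(x) \stackrel{\times}{\leq} g(x)$ for $f(x) = O(g(x))$ and $f(x) \stackrel{+}{\leq} g(x)$ for $f(x) \leq g(x) + O(1)$.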

Computability. A function $f: S \to \mathbb{R}\cup\{\infty\}$ is said to be enumerable (or lower semi-computable) if the set $\{(x,y) : y < f(x),\ x \in S,\ y \in \mathbb{Q}\}$ is recursively enumerable. $f$ is co-enumerable (or upper semi-computable) if $[-f]$ is enumerable. $f$ is computable (or estimable or recursive) if $f$ and $[-f]$ are enumerable. $f$ is approximable (or limit-computable) if there is a computable function $g: S\times\mathbb{N} \to \mathbb{R}$ with $\lim_{n\to\infty} g(x,n) = f(x)$. The set of enumerable functions is recursively enumerable.
Complexity. The conditional prefix (Kolmogorov) complexity $K(x|y) := \min\{\ell(p) : U(y,p) = x \text{ halts}\}$ is the length of the shortest binary program $p \in \{0,1\}^*$ on a universal prefix Turing machine $U$ with output $x \in \mathcal{X}^*$ and input $y \in \mathcal{X}^*$ [LV97]. $K(x) := K(x|\epsilon)$. For non-string objects $o$ we define $K(o) := K(\langle o\rangle)$, where $\langle o\rangle \in \mathcal{X}^*$ is some standard code for $o$. In particular, if $(f_i)_{i=1}^\infty$ is an enumeration of all enumerable functions, we define $K(f_i) = K(i)$. We only need the following elementary properties: the co-enumerability of $K$, the upper bounds $K(x|\ell(x)) \stackrel{+}{\leq} \ell(x)\log|\mathcal{X}|$ and $K(n) \stackrel{+}{\leq} 2\log n$, and $K(x|y) \stackrel{+}{\leq} K(x)$, subadditivity $K(x) \stackrel{+}{\leq} K(x,y) \stackrel{+}{\leq} K(y) + K(x|y)$, and information non-increase $K(f(x)) \stackrel{+}{\leq} K(x) + K(f)$ for recursive $f: \mathcal{X}^* \to \mathcal{X}^*$.

Marcus Hutter & Andrej Muchnik, IDSIA-14-04

We need the concepts of (universal) (semi)measures for strings [ZL70].

Definition 1 ((Semi)measures) We call $\nu: \mathcal{X}^* \to [0,1]$ a semimeasure if $\nu(x) \geq \sum_{a\in\mathcal{X}} \nu(xa)\ \forall x \in \mathcal{X}^*$, and a (probability) measure if equality holds and $\nu(\epsilon) = 1$. $\nu(x)$ denotes the $\nu$-probability that a sequence starts with string $x$. Further, $\nu(a|x) := \nu(xa)/\nu(x)$ is the posterior $\nu$-probability that the next symbol is $a \in \mathcal{X}$, given sequence $x \in \mathcal{X}^*$.
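To make Definition 1 concrete, here is a minimal Python sketch (our illustration, not from the paper) that checks the semimeasure and measure conditions on all binary strings up to a fixed length and evaluates the posterior $\nu(a|x)$. The Bernoulli measure and the `leaky` semimeasure are hypothetical examples, and a finite-depth check can only refute, never prove, the defining inequalities.

```python
from itertools import product

def strings(max_len, alphabet="01"):
    """All strings over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

def is_semimeasure(nu, max_len, alphabet="01", tol=1e-12):
    """Check nu(x) >= sum_a nu(xa) for every x with len(x) <= max_len."""
    return all(nu(x) + tol >= sum(nu(x + a) for a in alphabet)
               for x in strings(max_len, alphabet))

def is_measure(nu, max_len, alphabet="01", tol=1e-12):
    """Check nu(empty) = 1 and nu(x) = sum_a nu(xa) for len(x) <= max_len."""
    return abs(nu("") - 1) < tol and all(
        abs(nu(x) - sum(nu(x + a) for a in alphabet)) < tol
        for x in strings(max_len, alphabet))

def bernoulli(theta):
    """i.i.d. measure with probability theta for symbol '1'."""
    return lambda x: theta ** x.count("1") * (1 - theta) ** x.count("0")

def leaky(x):
    """Hypothetical semimeasure: each step keeps only 3/4 of the mass."""
    return (3 / 8) ** len(x)

def posterior(nu, a, x):
    """nu(a|x) := nu(xa) / nu(x), as in Definition 1."""
    return nu(x + a) / nu(x)

mu = bernoulli(0.3)
print(is_measure(mu, 6), is_semimeasure(leaky, 6), is_measure(leaky, 6))  # True True False
print(posterior(mu, "1", "0101"))  # approximately 0.3
```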

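Definition 2 (Universal semimeasures $M$) A semimeasure $M$ is called a universal element of a class of semimeasures $\mathcal{M}$, if
$$M \in \mathcal{M} \quad\text{and}\quad \forall\nu\in\mathcal{M}\ \exists w_\nu > 0 : M(x) \geq w_\nu\cdot\nu(x)\ \ \forall x \in \mathcal{X}^*.$$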

From now on we consider the (in a sense) largest class $\mathcal{M}$ which is relevant from a constructive point of view (but see [Sch02, Hut03b] for even larger constructive classes), namely the class of all semimeasures which can be enumerated (= effectively approximated) from below:
$$\mathcal{M} := \text{class of all enumerable semimeasures.} \qquad (1)$$

Solomonoff [Sol64, Eq.(7)] defined the universal posterior $M(y|x) = M(xy)/M(x)$ with $M(x)$ defined as the probability that the output of a universal monotone Turing machine starts with $x$ when provided with fair coin flips on the input tape. Levin [ZL70] has shown that this $M$ is a universal enumerable semimeasure. Another possible definition of $M$ is as a (Bayes) mixture [Sol64, ZL70, Sol78, LV97, Hut03b]: $\tilde M(x) = \sum_{\nu\in\mathcal{M}} 2^{-K(\nu)}\nu(x)$, where $K(\nu)$ is the length of the shortest program computing function $\nu$. Levin [ZL70] has shown that the class of all enumerable semimeasures is enumerable (with repetitions), hence $\tilde M$ is enumerable, since $K$ is co-enumerable. Hence $\tilde M \in \mathcal{M}$, which implies
$$M(x) \geq w_{\tilde M}\tilde M(x) \geq w_{\tilde M}2^{-K(\nu)}\nu(x) = w'_\nu\,\nu(x), \quad\text{where}\quad w'_\nu = w_{\tilde M}2^{-K(\nu)}. \qquad (2)$$

Up to a multiplicative constant, $M$ assigns higher probability to all $x$ than any other enumerable semimeasure. All $M$ have the same very slowly decreasing (in $\nu$) domination constants $w'_\nu$, essentially because $\tilde M \in \mathcal{M}$. We drop the prime from $w'_\nu$ in the following. The mixture definition $\tilde M$ immediately generalizes to arbitrary weighted sums of (semi)measures over other countable classes than $\mathcal{M}$, but the class may not contain the mixture, and the domination constants may be rapidly decreasing. We will exploit this for the construction of the non-universal semimeasure $W$ in Sections 5 and 6.
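The mixture idea behind (2) can be sketched on a finite, computable toy class. In the sketch below the class is a handful of Bernoulli measures, and the weights $2^{-K(\nu)}$, which are not computable, are replaced by the hypothetical weights $2^{-i}$; the point is only that a weighted sum dominates each component with that component's weight, as in Definition 2.

```python
from itertools import product

def bernoulli(theta):
    return lambda x: theta ** x.count("1") * (1 - theta) ** x.count("0")

# Hypothetical finite class and computable weights standing in for 2^{-K(nu)}.
thetas = [0.1, 0.25, 0.5, 0.75, 0.9]
weights = [2.0 ** -(i + 1) for i in range(len(thetas))]
components = [bernoulli(t) for t in thetas]

def xi(x):
    """Bayes mixture xi(x) = sum_i w_i * nu_i(x)."""
    return sum(w * nu(x) for w, nu in zip(weights, components))

def dominates(max_len):
    """Check xi(x) >= w_i * nu_i(x) for all i and all x with len(x) <= max_len."""
    for n in range(max_len + 1):
        for tup in product("01", repeat=n):
            x = "".join(tup)
            if any(xi(x) < w * nu(x) for w, nu in zip(weights, components)):
                return False
    return True

print(dominates(8))   # True
print(sum(weights))   # 0.96875, so xi(empty) < 1: a semimeasure, not a measure
```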
3 Posterior Convergence with Probability 1

The following convergence results for $M$ are well-known [Sol78, LV97, Hut03a].

Theorem 3 (Convergence of $M$ to $\mu$ w.p.1) For any universal semimeasure $M$ and any computable measure $\mu$ it holds:
$$M(x'_n|x_{<n}) \to \mu(x'_n|x_{<n}) \text{ for any } x'_n \quad\text{and}\quad \frac{M(x_n|x_{<n})}{\mu(x_n|x_{<n})} \to 1, \quad\text{both w.p.1 for } n\to\infty.$$



The first convergence in difference is Solomonoff's [Sol78] celebrated convergence result. The second convergence in ratio was first derived by Gács [LV97]. Note the subtle difference between the two convergence results. For any sequence $x'_{1:\infty}$ (possibly constant and not necessarily random), $M(x'_n|x_{<n}) - \mu(x'_n|x_{<n})$ converges to zero w.p.1 (referring to $x_{1:\infty}$), but no statement is possible for $M(x'_n|x_{<n})/\mu(x'_n|x_{<n})$, since $\liminf\mu(x'_n|x_{<n})$ could be zero. On the other hand, if we stay on-sequence ($x'_{1:\infty} = x_{1:\infty}$), we have $M(x_n|x_{<n})/\mu(x_n|x_{<n}) \to 1$ (whether $\inf\mu(x_n|x_{<n})$ tends to zero or not does not matter). Indeed, it is easy to give an example where $M(x'_n|x_{<n})/\mu(x'_n|x_{<n})$ diverges. For $\mu(1|x_{<n}) = 1 - \mu(0|x_{<n}) = \frac12 n^{-3}$ we get $\mu(0_{1:n}) = \prod_{t=1}^n(1 - \frac12 t^{-3}) \to c = 0.450... > 0$, i.e. $0_{1:\infty}$ is $\mu$-random. On the other hand, one can show that $M(0_{<n}) = O(1)$ and $M(0_{<n}1) \stackrel{\times}{=} 2^{-K(n)}$, which implies
$$\frac{M(1|0_{<n})}{\mu(1|0_{<n})} \stackrel{\times}{=} 2n^3\cdot 2^{-K(n)} \stackrel{\times}{\geq} n \to \infty \quad\text{for } n\to\infty \qquad (K(n) \stackrel{+}{\leq} 2\log n).$$
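The example can be checked numerically. The sketch below computes $\mu(0_{1:n}) = \prod_{t\le n}(1-\frac12 t^{-3})$ and the growth of the ratio. Since $K(n)$ is incomputable, the hypothetical helper `ratio_lower_bound` only evaluates $2n^3\cdot 2^{-2\log n} = 2n$, i.e. the lower bound implied by $K(n) \stackrel{+}{\leq} 2\log n$ with the multiplicative constant dropped; it illustrates the argument and does not compute $M$.

```python
import math

def mu_zeros(n):
    """mu(0_{1:n}) = prod_{t<=n} (1 - t^-3 / 2) for mu(1|x_{<t}) = t^-3 / 2."""
    p = 1.0
    for t in range(1, n + 1):
        p *= 1.0 - 0.5 * t ** -3
    return p

c = mu_zeros(10**5)
print(f"{c:.4f}")  # 0.4509

def ratio_lower_bound(n):
    """2 n^3 * 2^(-2 log2 n) = 2n: order of M(1|0_{<n}) / mu(1|0_{<n})
    obtained from K(n) <= 2 log2 n + O(1), constant dropped."""
    return 2 * n**3 * 2.0 ** (-2 * math.log2(n))

print([round(ratio_lower_bound(n)) for n in (10, 100, 1000)])  # [20, 200, 2000]
```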
Theorem 3 follows from (the discussion after) Lemma 4 due to $M(x) \geq w_\mu\mu(x)$. Actually, the Lemma strengthens and generalizes Theorem 3. In the following we denote expectations w.r.t. measure $\rho$ by $\mathbf{E}_\rho$, i.e. for a function $f: \mathcal{X}^n \to \mathbb{R}$, $\mathbf{E}_\rho[f] = \sum'_{x_{1:n}}\rho(x_{1:n})f(x_{1:n})$, where $\sum'$ sums over all $x_{1:n}$ for which $\rho(x_{1:n}) \neq 0$. Using $\sum'$ instead of $\sum$ is important for partial functions $f$ undefined on a set of $\rho$-measure zero. Similarly, $\mathbf{P}_\rho$ denotes the $\rho$-probability.

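Lemma 4 (Expected Bounds on Hellinger Sum) Let $\mu$ be a measure and $\nu$ be a semimeasure with $\nu(x) \geq w\cdot\mu(x)\ \forall x$. Then the following bounds on the Hellinger distance $h_t(\nu,\mu|\omega_{<t}) := \sum_{a\in\mathcal{X}}\big(\sqrt{\nu(a|\omega_{<t})} - \sqrt{\mu(a|\omega_{<t})}\big)^2$ hold:
$$\sum_{t=1}^\infty \mathbf{E}\Big[\Big(\sqrt{\tfrac{\nu(x_t|\omega_{<t})}{\mu(x_t|\omega_{<t})}} - 1\Big)^2\Big] \stackrel{(i)}{\leq} \sum_{t=1}^\infty \mathbf{E}[h_t] \stackrel{(ii)}{\leq} 2\ln\Big\{\mathbf{E}\Big[\exp\Big(\tfrac12\sum_{t=1}^\infty h_t\Big)\Big]\Big\} \stackrel{(iii)}{\leq} \ln w^{-1},$$
where $\mathbf{E}$ means expectation w.r.t. $\mu$.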

The $\ln w^{-1}$-bounds on the first and second expression were first derived in [Hut03a], the second being a variation of Solomonoff's bound $\sum_n\mathbf{E}[(\nu(0|x_{<n}) - \mu(0|x_{<n}))^2] \leq \frac12\ln w^{-1}$. If the sequence $x_1x_2...$ is sampled from the probability measure $\mu$, these bounds imply
$$\nu(x'_n|x_{<n}) \to \mu(x'_n|x_{<n}) \text{ for any } x'_n \quad\text{and}\quad \frac{\nu(x_n|x_{<n})}{\mu(x_n|x_{<n})} \to 1, \quad\text{both w.p.1 for } n\to\infty,$$
where w.p.1 stands here and in the following for "with $\mu$-probability 1".

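Convergence is "fast" in the following sense: The second bound ($\sum_t\mathbf{E}[h_t] \leq \ln w^{-1}$) implies that the expected number of times $t$ in which $h_t > \varepsilon$ is finite and bounded by $\frac1\varepsilon\ln w^{-1}$. The new third bound represents a significant improvement. It implies by means of a Markov inequality that the probability of even only marginally exceeding this number is extremely small, and that $\sum_t h_t$ is very unlikely to exceed $\ln w^{-1}$ by much. More precisely:
$$\mathbf{P}\big[\#\{t : h_t > \varepsilon\} \geq \tfrac1\varepsilon(\ln w^{-1} + c)\big] \leq \mathbf{P}\big[\textstyle\sum_t h_t \geq \ln w^{-1} + c\big] = \mathbf{P}\big[\exp(\tfrac12\textstyle\sum_t h_t) \geq w^{-1/2}e^{c/2}\big] \leq \sqrt{w}\,\mathbf{E}\big[\exp(\tfrac12\textstyle\sum_t h_t)\big]e^{-c/2} \leq e^{-c/2}.$$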
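Lemma 4 and the tail bound above can be sanity-checked exactly on a small example. In this sketch (our assumption, not the paper's setup) $\mu$ is Bernoulli(0.3), $\nu$ is a Bayes mixture over five Bernoulli parameters containing 0.3 with weight $w = 0.2$, so $\nu \geq w\mu$, and all $2^{10}$ sequences of length $n = 10$ are enumerated. It checks $\sum_t\mathbf{E}[h_t] \leq 2\ln\mathbf{E}[\exp(\frac12\sum_t h_t)] \leq \ln w^{-1}$ for the first $n$ terms, and $\mathbf{P}[\sum_t h_t \geq \ln w^{-1} + c] \leq e^{-c/2}$ for $c = 1, 2$.

```python
import math
from itertools import product

# Hypothetical setup: mu = Bernoulli(0.3); nu = Bayes mixture of Bernoulli(theta)
# over a small class, so nu(x) >= w * mu(x) with w = w_of[0.3].
w_of = {0.1: 0.1, 0.3: 0.2, 0.5: 0.3, 0.7: 0.2, 0.9: 0.2}
theta_mu = 0.3
w = w_of[theta_mu]

def seq_prob(theta, x):
    """Bernoulli(theta) probability of the 0/1 tuple x."""
    n1 = sum(x)
    return theta ** n1 * (1 - theta) ** (len(x) - n1)

def nu(x):
    return sum(wt * seq_prob(th, x) for th, wt in w_of.items())

def hellinger_t(x_prev):
    """h_t(nu, mu | x_{<t}) over the binary alphabet."""
    h = 0.0
    for a in (0, 1):
        nu_post = nu(x_prev + (a,)) / nu(x_prev)
        mu_post = theta_mu if a == 1 else 1 - theta_mu
        h += (math.sqrt(nu_post) - math.sqrt(mu_post)) ** 2
    return h

n = 10
mean_H = 0.0                  # E_mu[sum_t h_t]
mean_expH = 0.0               # E_mu[exp(sum_t h_t / 2)]
tail = {1.0: 0.0, 2.0: 0.0}   # c -> P_mu[sum_t h_t >= ln(1/w) + c]
for x in product((0, 1), repeat=n):
    H = sum(hellinger_t(x[:t]) for t in range(n))
    p = seq_prob(theta_mu, x)
    mean_H += p * H
    mean_expH += p * math.exp(0.5 * H)
    for c in tail:
        if H >= math.log(1 / w) + c:
            tail[c] += p

print(mean_H <= 2 * math.log(mean_expH) <= math.log(1 / w))  # bounds (ii), (iii)
print(all(tail[c] <= math.exp(-c / 2) for c in tail))         # tail bound
```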

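Proof. We use the abbreviations $\rho_t = \rho(x_t|x_{<t})$ and $\rho_{1:n} = \rho_1\cdot...\cdot\rho_n = \rho(x_{1:n})$ for $\rho \in \{\mu,\nu,R,N,...\}$ and $h_t = \sum_{x_t}(\sqrt{\nu_t} - \sqrt{\mu_t})^2$.

(i) follows from
$$\mathbf{E}\Big[\Big(\sqrt{\tfrac{\nu_t}{\mu_t}} - 1\Big)^2\Big|x_{<t}\Big] = \sum'_{x_t}\mu_t\Big(\sqrt{\tfrac{\nu_t}{\mu_t}} - 1\Big)^2 = \sum'_{x_t}(\sqrt{\nu_t} - \sqrt{\mu_t})^2 \leq h_t$$
by taking the expectation $\mathbf{E}[\cdot]$ and summing over $t$.

(ii) follows from Jensen's inequality $\exp(\mathbf{E}[f]) \leq \mathbf{E}[\exp(f)]$ for $f = \frac12\sum_t h_t$.

(iii) We exploit a construction used in [Vov87, Thm.1]. For discrete (semi)measures $p$ and $q$ with $\sum_i p_i = 1$ and $\sum_i q_i \leq 1$ it holds:
$$\sum_i\sqrt{p_iq_i} \leq 1 - \tfrac12\sum_i(\sqrt{p_i} - \sqrt{q_i})^2 \leq \exp\Big[-\tfrac12\sum_i(\sqrt{p_i} - \sqrt{q_i})^2\Big]. \qquad (3)$$
The first inequality is obvious after multiplying out the second expression. The second inequality follows from $1 - x \leq e^{-x}$. Vovk [Vov87] defined a measure $R_t := \sqrt{\mu_t\nu_t}/N_t$ with normalization $N_t := \sum_{x_t}\sqrt{\mu_t\nu_t}$. Applying (3) for measure $\mu$ and semimeasure $\nu$ we get $N_t \leq \exp(-\frac12 h_t)$. Together with $\nu(x) \geq w\cdot\mu(x)\ \forall x$ this implies
$$\prod_{t=1}^n R_t = \prod_{t=1}^n\frac{\sqrt{\mu_t\nu_t}}{N_t} = \frac{\sqrt{\mu_{1:n}\nu_{1:n}}}{N_{1:n}} \geq \mu_{1:n}\sqrt{w}\,\exp\Big(\tfrac12\sum_{t=1}^n h_t\Big).$$
Summing over $x_{1:n}$ and exploiting $\sum_{x_t}R_t = 1$ we get $1 \geq \sqrt{w}\,\mathbf{E}[\exp(\frac12\sum_{t=1}^n h_t)]$, which proves (iii). □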

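The bound and proof may be generalized to $1 \geq w^\kappa\,\mathbf{E}[\exp(\sum_t\sum_{x_t}(\mu_t - \mu_t^{1-\kappa}\nu_t^\kappa))]$ with $0 \leq \kappa \leq 1$ by defining $R_t = \mu_t^{1-\kappa}\nu_t^\kappa/N_t$ with $N_t = \sum_{x_t}\mu_t^{1-\kappa}\nu_t^\kappa$ and exploiting $\sum_i p_i^{1-\kappa}q_i^\kappa \leq \exp[-\sum_i(p_i - p_i^{1-\kappa}q_i^\kappa)]$ for $\sum_i p_i = 1$, which again follows from $1 - x \leq e^{-x}$.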
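Inequality (3) and the $\kappa$-analogue used for the generalization can be spot-checked numerically. The sketch below (our illustration) draws random pairs with $\sum_i p_i = 1$ and $\sum_i q_i \leq 1$; the exact form of the $\kappa$-inequality follows our reading of the generalization above and should be treated as an assumption. Random testing only gives evidence; the proofs are the one-line arguments in the text.

```python
import math
import random

def check_ineq3(p, q, tol=1e-12):
    """(3): sum_i sqrt(p_i q_i) <= 1 - h/2 <= exp(-h/2), h = Hellinger distance."""
    h = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    lhs = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return lhs <= 1 - h / 2 + tol and 1 - h / 2 <= math.exp(-h / 2) + tol

def check_kappa(p, q, kappa, tol=1e-12):
    """sum_i p_i^(1-k) q_i^k <= exp(-sum_i (p_i - p_i^(1-k) q_i^k))."""
    terms = [a ** (1 - kappa) * b ** kappa for a, b in zip(p, q)]
    return sum(terms) <= math.exp(-sum(a - t for a, t in zip(p, terms))) + tol

def random_pair(m, rng):
    """Random p with sum 1 and q with sum in [0.5, 1]."""
    p = [rng.random() for _ in range(m)]
    q = [rng.random() for _ in range(m)]
    sp = sum(p)
    sq = sum(q) / rng.uniform(0.5, 1.0)
    return [a / sp for a in p], [b / sq for b in q]

rng = random.Random(0)
pairs = [random_pair(4, rng) for _ in range(1000)]
print(all(check_ineq3(p, q) for p, q in pairs))                                 # True
print(all(check_kappa(p, q, k) for p, q in pairs for k in (0.25, 0.5, 0.75)))  # True
```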

One can show that the constant $\frac12$ in Lemma 4 can essentially not be improved. Increasing it to a constant $\alpha > 1$ makes the expression infinite for some (Bernoulli) distribution $\mu$ (however we choose $\nu$). For $\nu = M$ the expression can already become infinite for $\alpha > \frac12$ and some computable measure $\mu$.
4 Non-Convergence in Martin-Löf Sense

Convergence of $M(x_n|x_{<n})$ to $\mu(x_n|x_{<n})$ with $\mu$-probability 1 tells us that $M(x_n|x_{<n})$ is close to $\mu(x_n|x_{<n})$ for sufficiently large $n$ on "most" sequences $x_{1:\infty}$. It says nothing about whether convergence is true for any particular sequence (of measure 0). Martin-Löf randomness can be used to capture convergence properties for individual sequences. Martin-Löf randomness is a very important concept of randomness of individual sequences, which is closely related to Kolmogorov complexity and Solomonoff's universal semimeasure $M$. Levin gave a characterization equivalent to Martin-Löf's original definition [Lev73]:

Definition 5 (Martin-Löf random sequences) A sequence $\omega = x_{1:\infty}$ is $\mu$-Martin-Löf random ($\mu$.M.L.) iff there is a constant $c < \infty$ such that $M(\omega_{1:n}) \leq c\cdot\mu(\omega_{1:n})$ for all $n$. Moreover, $d_\mu(\omega) := \sup_n\{\log\frac{M(\omega_{1:n})}{\mu(\omega_{1:n})}\} \leq \log c$ is called the randomness deficiency of $\omega$.
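Because $M$ is only enumerable, $d_\mu(\omega)$ cannot be computed. As a rough, hypothetical stand-in, the sketch below replaces $M$ by a uniform mixture $\xi$ of five Bernoulli measures and evaluates $\max_{n\le N}\log_2\xi(\omega_{1:n})/\lambda(\omega_{1:n})$ for the uniform measure $\lambda$. This proxy is not the randomness deficiency of Definition 5; it only shows the qualitative behaviour: it stays small on a typical coin-flip prefix and grows linearly on the all-zeros prefix.

```python
import math
import random

THETAS = [0.05, 0.25, 0.5, 0.75, 0.95]   # hypothetical class, uniform weights 1/5

def bern_log2(theta, x):
    """log2 of the Bernoulli(theta) probability of the 0/1 string x."""
    n1 = x.count("1")
    return n1 * math.log2(theta) + (len(x) - n1) * math.log2(1 - theta)

def xi_log2(x):
    """log2 of xi(x) = sum_i (1/5) Bernoulli(theta_i)(x), computed stably."""
    logs = [bern_log2(t, x) - math.log2(len(THETAS)) for t in THETAS]
    top = max(logs)
    return top + math.log2(sum(2.0 ** (v - top) for v in logs))

def deficiency_proxy(omega, N):
    """max_{n<=N} log2 xi(omega_{1:n}) / lambda(omega_{1:n}) with lambda uniform."""
    return max(xi_log2(omega[:n]) + n for n in range(1, N + 1))

rng = random.Random(1)
coin = "".join(rng.choice("01") for _ in range(200))
print(round(deficiency_proxy(coin, 200), 2))       # small for a typical coin sequence
print(round(deficiency_proxy("0" * 200, 200), 2))  # grows linearly with N
```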

One can show that an M.L.-random sequence $x_{1:\infty}$ passes all thinkable effective randomness tests, e.g. the law of large numbers, the law of the iterated logarithm, etc. In particular, the set of all $\mu$.M.L.-random sequences has $\mu$-measure 1.

The open question we study in this section is whether $M$ converges to $\mu$ (in difference or ratio) individually for all Martin-Löf random sequences. Clearly, Theorem 3 implies that $\mu$.M.L.-convergence may at most fail for a set of sequences with $\mu$-measure zero. An M.L.-convergence result would be particularly interesting and natural for $M$, since M.L.-randomness can be defined in terms of $M$ itself (Definition 5).

The state of the art regarding this problem may be summarized as follows: [Vov87] contains a (non-improvable?) result which is slightly too weak to imply M.L.-convergence, [LV97, Thm.5.2.2] and [VL00, Thm.10] contain an erroneous proof for M.L.-convergence, and [Hut03b] proves a theorem indicating that the answer may be hard and subtle (see [Hut03b] for details).

The main contribution of this section is a partial answer to this question. We 
show that M.L.-convergence fails at least for some universal semimeasures: 

Theorem 6 (Universal semimeasure non-convergence) There exists a universal semimeasure $M$, a computable measure $\mu$, and a $\mu$.M.L.-random sequence $\alpha$, such that $M(\alpha_n|\alpha_{<n}) \not\to \mu(\alpha_n|\alpha_{<n})$ for $n\to\infty$.

This implies that also $M_n/\mu_n$ does not converge to 1 (since $\mu_n \leq 1$ is bounded). We do not know whether Theorem 6 holds for all universal semimeasures. The proof idea is to construct an enumerable (semi)measure $\nu$ such that $\nu$ dominates $M$ on some $\mu$-random sequence $\alpha$, but $\nu(\alpha_n|\alpha_{<n}) \not\to \mu(\alpha_n|\alpha_{<n})$. Then we mix $M$ into $\nu$ to make $\nu$ universal, but with larger contribution from $\nu$, in order to preserve non-convergence. There is also a non-constructive proof showing that an arbitrarily small contamination with $\nu$ can lead to non-convergence. We only present the constructive proof.
Proof. We consider binary alphabet $\mathcal{X} = \{0,1\}$ only. Let $\mu = \lambda$, $\lambda(x) := 2^{-\ell(x)}$, be the uniform measure. We define the sequence $\alpha$ as the (in a sense) lexicographically first (or equivalently left-most in the tree of sequences) $\lambda$.M.L.-random sequence. Formally we define $\alpha$ inductively in $n = 1,2,3,...$ by
$$\alpha_n = 0 \text{ if } M(\alpha_{<n}0) \leq 2^{-n}, \text{ and } \alpha_n = 1 \text{ else.} \qquad (4)$$
We know that $M(\epsilon) \leq 1$ and $M(\alpha_{<n}0) \leq 2^{-n}$ if $\alpha_n = 0$. Inductively, assuming $M(\alpha_{<n}) \leq 2^{-n+1}$, for $\alpha_n = 1$ we have $2^{-n+1} \geq M(\alpha_{<n}) \geq M(\alpha_{<n}0) + M(\alpha_{<n}1) > 2^{-n} + M(\alpha_{<n}1)$ since $M$ is a semimeasure, hence $M(\alpha_{<n}1) < 2^{-n}$. Hence
$$M(\alpha_{1:n}) \leq 2^{-n} = \lambda(\alpha_{1:n})\ \ \forall n, \quad\text{i.e. } \alpha \text{ is } \lambda\text{.M.L.-random.} \qquad (5)$$
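Rule (4) and the induction leading to (5) use only the semimeasure property of $M$, so the construction can be run with any computable semimeasure in place of $M$. The sketch below uses a hypothetical stand-in `sm` (a Bernoulli mixture with total mass $\frac12$) and exact rational arithmetic; with the real, incomputable $M$ the same procedure would not be executable.

```python
from fractions import Fraction

def bern(theta, x):
    n1 = x.count("1")
    return theta ** n1 * (1 - theta) ** (len(x) - n1)

def sm(x):
    """Hypothetical computable semimeasure standing in for M:
    (1/4) Bernoulli(1/4) + (1/4) Bernoulli(3/4), so sm(empty) = 1/2 <= 1."""
    return Fraction(1, 4) * bern(Fraction(1, 4), x) + Fraction(1, 4) * bern(Fraction(3, 4), x)

def alpha_prefix(semimeasure, n):
    """Rule (4): alpha_k = 0 if semimeasure(alpha_{<k} 0) <= 2^-k, else 1."""
    a = ""
    for k in range(1, n + 1):
        a += "0" if semimeasure(a + "0") <= Fraction(1, 2 ** k) else "1"
    return a

alpha = alpha_prefix(sm, 30)
print(alpha)
print(all(sm(alpha[:k]) <= Fraction(1, 2 ** k) for k in range(1, 31)))  # True, as in (5)
```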

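Let $M^t$ with $t = 1,2,3,...$ be computable approximations of $M$, which enumerate $M$, i.e. $M^t(x) \nearrow M(x)$ for $t\to\infty$. We define $\alpha^t$ like $\alpha$ but with $M$ replaced by $M^t$ in the definition. $M^t \nearrow M$ implies $\alpha^t \nearrow \alpha$ (lexicographically increasing). We define an enumerable semimeasure $\nu$ as follows:
$$\nu^t(x) := \begin{cases} 2^{-t} & \text{if } \ell(x) = t \text{ and } x < \alpha^t_{1:t} \\ 0 & \text{if } \ell(x) = t \text{ and } x \geq \alpha^t_{1:t} \\ 0 & \text{if } \ell(x) > t \\ \nu^t(x0) + \nu^t(x1) & \text{if } \ell(x) < t \end{cases} \qquad (6)$$
where $<$ is the lexicographical ordering on sequences. $\nu^t$ is a semimeasure, and with $\alpha^t$ also $\nu^t$ is computable and monotone increasing in $t$, hence $\nu := \lim_{t\to\infty}\nu^t$ is an enumerable semimeasure (indeed, $\nu$ is a measure). We could have defined a $\nu_{tn}$ by replacing $\alpha^t_{1:t}$ with $\alpha^n_{1:t}$ in (6). Since $\nu_{tn}$ is monotone increasing in $t$ and $n$, any order of $t,n\to\infty$ leads to $\nu$, so we have chosen arbitrarily $t = n$. By induction (starting from $\ell(x) = t$) it follows that
$$\nu^t(x) = 2^{-\ell(x)} \text{ if } x < \alpha^t_{1:\ell(x)} \text{ and } \ell(x) \leq t, \qquad \nu^t(x) = 0 \text{ if } x > \alpha^t_{1:\ell(x)}.$$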
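The closed form of $\nu^t$ can be checked for a fixed prefix. In this sketch, `alpha_t` is a hypothetical value of $\alpha^t_{1:t}$; `nu_t` evaluates (6) by counting the length-$t$ strings that extend $x$ and lie lexicographically below `alpha_t`, and the loop verifies the two stated cases, the in-between case on-sequence, and additivity for $\ell(x) < t$.

```python
from fractions import Fraction
from itertools import product

def nu_t(x, alpha_t):
    """nu^t(x) from (6) for a given prefix alpha_t = alpha^t_{1:t} (a 0/1 string)."""
    t, ell = len(alpha_t), len(x)
    if ell > t:
        return Fraction(0)
    below = int(alpha_t, 2) if t else 0               # #{y in {0,1}^t : y < alpha_t}
    start = int(x, 2) * 2 ** (t - ell) if ell else 0  # smallest y in {0,1}^t extending x
    count = min(max(below - start, 0), 2 ** (t - ell))
    return Fraction(count, 2 ** t)

alpha_t = "0110100111"   # hypothetical value of alpha^t_{1:t}, t = 10
ok = True
for ell in range(len(alpha_t) + 1):
    for tup in product("01", repeat=ell):
        x = "".join(tup)
        v, pre = nu_t(x, alpha_t), alpha_t[:ell]
        if x < pre:                       # off-sequence, left of alpha^t
            ok = ok and v == Fraction(1, 2 ** ell)
        elif x > pre:                     # off-sequence, right of alpha^t
            ok = ok and v == 0
        else:                             # on-sequence: somewhere in between
            ok = ok and 0 <= v <= Fraction(1, 2 ** ell)
        if ell < len(alpha_t):            # additivity for len(x) < t
            ok = ok and v == nu_t(x + "0", alpha_t) + nu_t(x + "1", alpha_t)
print(ok)  # True
```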



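On-sequence, i.e. for $x = \alpha^t_{1:\ell(x)}$, $\nu^t(x)$ is somewhere in-between $0$ and $2^{-\ell(x)}$. Since the sequence $\alpha := \lim_t\alpha^t$ is $\lambda$.M.L.-random, it contains 01 infinitely often; actually $\alpha_n\alpha_{n+1} = 01$ for a non-vanishing fraction of $n$. In the following we fix such an $n$. For $t \geq n$ we get
$$\nu^t(\alpha_{<n}) = \nu^t(\alpha_{<n}0) + \underbrace{\nu^t(\alpha_{<n}1)}_{=0 \text{ since } \alpha_{<n}1 > \alpha_{1:n} \geq \alpha^t_{1:n}} = \nu^t(\alpha_{<n}0) = \nu^t(\alpha_{1:n}) \quad\Longrightarrow\quad \nu(\alpha_{<n}) = \nu(\alpha_{1:n}),$$
where $\alpha_{<n}1 > \alpha_{1:n}$ holds since $\alpha_n = 0$. This ensures $\nu(\alpha_n|\alpha_{<n}) = \frac{\nu(\alpha_{1:n})}{\nu(\alpha_{<n})} = 1 \neq \frac12 = \lambda_n$. For $t > n$ large enough such that $\alpha^t_{1:n+1} = \alpha_{1:n+1}$ we get
$$\nu^t(\alpha_{1:n}) \geq \nu^t(\alpha_{1:n}0) = 2^{-n-1}, \quad\text{since } \alpha_{1:n}0 < \alpha_{1:n+1} = \alpha^t_{1:n+1}, \text{ as } \alpha_{n+1} = 1.$$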
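This ensures $\nu(\alpha_{1:n}) \geq 2^{-n-1} \geq \frac12 M(\alpha_{1:n})$ by (5). Let $M$ be any universal semimeasure and $0 < \gamma < \frac15$. Then $M'(x) := (1-\gamma)\nu(x) + \gamma M(x)\ \forall x$ is also a universal semimeasure with
$$M'(\alpha_n|\alpha_{<n}) = \frac{(1-\gamma)\nu(\alpha_{1:n}) + \gamma M(\alpha_{1:n})}{(1-\gamma)\nu(\alpha_{<n}) + \gamma M(\alpha_{<n})} \geq \frac{(1-\gamma)\nu(\alpha_{1:n})}{(1-\gamma)\nu(\alpha_{<n}) + \gamma 2^{-n+1}} = \frac{1-\gamma}{1-\gamma+\gamma 2^{-n+1}/\nu(\alpha_{1:n})} \geq \frac{1-\gamma}{1+3\gamma} > \frac12,$$
where we used $M(\alpha_{<n}) \leq 2^{-n+1}$ and $M(\alpha_{1:n}) \geq 0$ for the first inequality, then $\nu(\alpha_{<n}) = \nu(\alpha_{1:n})$ and $\nu(\alpha_{1:n}) \geq 2^{-n-1}$. For instance, for $\gamma = \frac19$ we have $M'(\alpha_n|\alpha_{<n}) \geq \frac23 \neq \frac12 = \lambda(\alpha_n|\alpha_{<n})$ for a non-vanishing fraction of $n$'s. □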
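The final arithmetic step can be verified with exact fractions. In the sketch below (our illustration), `r` stands for $2^{-n+1}/\nu(\alpha_{1:n})$, which is at most 4 on the chosen $n$'s, so the worst case is $r = 4$; the grid check illustrates that $(1-\gamma)/(1+3\gamma) > \frac12$ exactly when $\gamma < \frac15$.

```python
from fractions import Fraction

def bound_from_ratio(gamma, r):
    """(1-gamma) / (1-gamma+gamma*r) with r = 2^(-n+1) / nu(alpha_{1:n})."""
    return (1 - gamma) / (1 - gamma + gamma * r)

def lower_bound(gamma):
    """Worst case r = 4 (nu(alpha_{1:n}) >= 2^(-n-1)) gives (1-gamma)/(1+3 gamma)."""
    return bound_from_ratio(gamma, 4)

print(lower_bound(Fraction(1, 9)))  # 2/3
# The bound exceeds 1/2 exactly when gamma < 1/5 (checked on a grid):
grid = [Fraction(k, 100) for k in range(1, 100)]
print(all((lower_bound(g) > Fraction(1, 2)) == (g < Fraction(1, 5)) for g in grid))  # True
```

Since `bound_from_ratio` is decreasing in `r`, any smaller ratio only improves the bound.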

A converse of Theorem 6 can also be shown:

Theorem 7 (Convergence on non-random sequences) For every universal semimeasure $M$ there exist computable measures $\mu$ and non-$\mu$.M.L.-random sequences $\alpha$ for which $M(\alpha_n|\alpha_{<n})/\mu(\alpha_n|\alpha_{<n}) \to 1$.



5 Convergence in Martin-Löf Sense

In this and the next section we give a positive answer to the question of posterior M.L.-convergence to $\mu$. We consider general finite alphabet $\mathcal{X}$.

Theorem 8 (Universal predictor for M.L.-random sequences) There exists an enumerable semimeasure $W$ such that for every computable measure $\mu$ and every $\mu$.M.L.-random sequence $\omega$, the posteriors converge to each other:
$$W(a|\omega_{<t}) \stackrel{t\to\infty}{\longrightarrow} \mu(a|\omega_{<t}) \text{ for all } a \in \mathcal{X} \text{ if } d_\mu(\omega) < \infty.$$

The semimeasure $W$ we will construct is not universal in the sense of dominating all enumerable semimeasures, unlike $M$. Normalizing $W$ shows that there is also a measure whose posterior converges to $\mu$, but this measure is not enumerable, only approximable. For proving Theorem 8 we first define an intermediate measure $D$ as a mixture over all computable measures, which is not even approximable. Based on Lemmas 4, 9, and 10, Proposition 11 shows that $D$ M.L.-converges to $\mu$. We then define the concept of quasimeasures and an enumerable semimeasure $W$ as a mixture over all enumerable quasimeasures. Proposition 12 shows that $W$ M.L.-converges to $D$. Theorem 8 immediately follows from Propositions 11 and 12.

Lemma 9 (Hellinger Chain) Let $h(p,q) := \sum_{i=1}^d(\sqrt{p_i} - \sqrt{q_i})^2$ be the Hellinger distance between $p = (p_i)_{i=1}^d \in \mathbb{R}_+^d$ and $q = (q_i)_{i=1}^d \in \mathbb{R}_+^d$. Then
i) for $p,q,r \in \mathbb{R}_+^d$: $h(p,q) \leq (1+\beta)\,h(p,r) + (1+\beta^{-1})\,h(r,q)$, for any $\beta > 0$;
ii) for $p^1,...,p^m \in \mathbb{R}_+^d$: $h(p^1,p^m) \leq 3\sum_{k=2}^m k^2\,h(p^{k-1},p^k)$.

Proof. (i) For any $x,y \in \mathbb{R}$ and $\beta > 0$ we have $(x+y)^2 \leq (1+\beta)x^2 + (1+\beta^{-1})y^2$. Inserting $x = \sqrt{p_i} - \sqrt{r_i}$ and $y = \sqrt{r_i} - \sqrt{q_i}$ and summing over $i$ proves (i).
(ii) Apply (i) to the triples $(p^k, p^{k+1}, p^m)$ for $k = 1,2,...,m-2$ in this order, with $\beta = \beta_k = k(k+1)$, and finally use $\prod_{k=1}^{m-2}(1+\beta_k^{-1}) \leq e < 3$. □
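Both parts of Lemma 9 can be spot-checked on random non-negative vectors (the vectors need not be normalized). The following sketch is only a numerical sanity check of the stated inequalities, not a substitute for the proof; the vector dimension 5 and the ranges for $\beta$ and $m$ are arbitrary choices.

```python
import math
import random

def hellinger(p, q):
    """h(p, q) = sum_i (sqrt(p_i) - sqrt(q_i))^2 for non-negative vectors."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

rng = random.Random(0)

def rand_vec(d):
    return [rng.random() for _ in range(d)]

ok_i = ok_ii = True
for _ in range(500):
    p, q, r = rand_vec(5), rand_vec(5), rand_vec(5)
    beta = rng.uniform(0.01, 10.0)
    # Lemma 9(i)
    ok_i = ok_i and hellinger(p, q) <= (
        (1 + beta) * hellinger(p, r) + (1 + 1 / beta) * hellinger(r, q) + 1e-12)
    # Lemma 9(ii) with ps = [p^1, ..., p^m]
    m = rng.randint(2, 8)
    ps = [rand_vec(5) for _ in range(m)]
    chain = 3 * sum(k ** 2 * hellinger(ps[k - 2], ps[k - 1]) for k in range(2, m + 1))
    ok_ii = ok_ii and hellinger(ps[0], ps[-1]) <= chain + 1e-12
print(ok_i, ok_ii)  # True True
```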

We need a way to convert expected bounds to bounds on individual M.L.-random sequences, sort of a converse of "M.L. implies w.p.1". Consider for instance the Hellinger sum $H(\omega) := \sum_{t=1}^\infty h_t(\mu,\rho)$ between two computable measures $\rho \geq w\cdot\mu$. Then $H$ is an enumerable function and Lemma 4 implies $\mathbf{E}[H] \leq \ln w^{-1} < \infty$, hence $H$ is (up to a constant factor) an integral $\mu$-test. $H$ can be increased to an enumerable $\mu$-submartingale $\tilde H$. The universal $\mu$-submartingale $M/\mu$ multiplicatively dominates all enumerable submartingales (and hence $\tilde H$). Since $M/\mu \leq 2^{d_\mu(\omega)}$, this implies the desired bound $H(\omega) \stackrel{\times}{\leq} 2^{d_\mu(\omega)}$ for individual $\omega$. We give a self-contained direct proof, explicating all important constants.

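Lemma 10 (Expected to Individual Bound) Let $F(\omega) \geq 0$ be an enumerable function, $\mu$ be an enumerable measure, and $\varepsilon > 0$ be co-enumerable. Then:
$$\text{If } \mathbf{E}_\mu[F] \leq \varepsilon \text{ then } F(\omega) \stackrel{\times}{\leq} \varepsilon\cdot 2^{K(\mu,F,1/\varepsilon) + d_\mu(\omega)}\ \ \forall\omega,$$
where $d_\mu(\omega)$ is the $\mu$-randomness deficiency of $\omega$ and $K(\mu,F,1/\varepsilon)$ is the length of the shortest program for $\mu$, $F$, and $1/\varepsilon$.

Lemma 10 roughly says that for $\mu$, $F$, and $\varepsilon = \mathbf{E}_\mu[F]$ with short program ($K(\mu,F,1/\varepsilon) = O(1)$) and $\mu$-random $\omega$ ($d_\mu(\omega) = O(1)$) we have $F(\omega) \stackrel{\times}{\leq} \mathbf{E}_\mu[F]$.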

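Proof. Let $F(\omega) = \lim_{n\to\infty}F_n(\omega) = \sup_n F_n(\omega)$ be enumerated by an increasing sequence of computable functions $F_n(\omega)$. $F_n(\omega)$ can be chosen to depend on $\omega_{1:n}$ only, i.e. $F_n(\omega) = F_n(\omega_{1:n})$ is independent of $\omega_{n+1:\infty}$. Let $\varepsilon_n \searrow \varepsilon$ co-enumerate $\varepsilon$. We define
$$\tilde\mu_n(\omega_{1:k}) := \varepsilon_n^{-1}\sum_{\omega_{k+1:n}\in\mathcal{X}^{n-k}}\mu(\omega_{1:n})F_n(\omega_{1:n}) \text{ for } k \leq n, \quad\text{and}\quad \tilde\mu_n(\omega_{1:k}) := 0 \text{ for } k > n.$$
$\tilde\mu_n$ is a computable semimeasure for each $n$ (due to $\mathbf{E}_\mu[F_n] \leq \varepsilon \leq \varepsilon_n$) and increasing in $n$, since
$$\tilde\mu_n(\omega_{1:k}) \geq 0 = \tilde\mu_{n-1}(\omega_{1:k}) \text{ for } k \geq n \quad\text{and}$$
$$\tilde\mu_n(\omega_{<n}) \underset{F_n \geq F_{n-1}}{\geq} \varepsilon_n^{-1}\sum_{\omega_n}\mu(\omega_{1:n})F_{n-1}(\omega_{<n}) \underset{\mu\text{ measure}}{=} \varepsilon_n^{-1}\mu(\omega_{<n})F_{n-1}(\omega_{<n}) \underset{\varepsilon_n \leq \varepsilon_{n-1}}{\geq} \tilde\mu_{n-1}(\omega_{<n}),$$
and similarly for $k < n-1$. Hence $\tilde\mu := \tilde\mu_\infty$ is an enumerable semimeasure (indeed $\tilde\mu$ is proportional to a measure). From dominance (2) we get
$$M(\omega_{1:n}) \stackrel{\times}{\geq} 2^{-K(\tilde\mu)}\tilde\mu(\omega_{1:n}) \geq 2^{-K(\tilde\mu)}\tilde\mu_n(\omega_{1:n}) = 2^{-K(\tilde\mu)}\varepsilon_n^{-1}\mu(\omega_{1:n})F_n(\omega_{1:n}).$$
In order to enumerate $\tilde\mu$, we need to enumerate $\mu$, $F$, and $\varepsilon^{-1}$, hence $K(\tilde\mu) \stackrel{+}{\leq} K(\mu,F,1/\varepsilon)$, so we get
$$F_n(\omega) \stackrel{\times}{\leq} \varepsilon_n\,2^{K(\tilde\mu)}\frac{M(\omega_{1:n})}{\mu(\omega_{1:n})} \stackrel{\times}{\leq} \varepsilon_n\,2^{K(\mu,F,1/\varepsilon) + d_\mu(\omega)}.$$
Taking the limit $F_n \nearrow F$ and $\varepsilon_n \searrow \varepsilon$ completes the proof. □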
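The construction of $\tilde\mu_n$ can be illustrated on a toy instance. In this sketch $\mu$ is the uniform measure, $F_n(x_{1:n})$ is the hypothetical choice "number of leading zeros of $x_{1:n}$" (computable, increasing in $n$, with $\mathbf{E}_\mu[F_n] = \sum_{t\le n}2^{-t} < 1$), and $\varepsilon_n = \varepsilon = 1$. It checks that each $\tilde\mu_n$ is a semimeasure with $\tilde\mu_n(\epsilon) \leq 1$ and that $\tilde\mu_n$ is increasing in $n$, the two facts the proof relies on.

```python
from fractions import Fraction
from itertools import product

def mu(x):
    """Uniform measure lambda(x) = 2^-len(x)."""
    return Fraction(1, 2 ** len(x))

def F(n, x):
    """Hypothetical F_n: number of leading zeros of x_{1:n}."""
    x = x[:n]
    return len(x) - len(x.lstrip("0"))

EPS = Fraction(1)   # eps_n = eps = 1 >= E_mu[F_n]

def mu_tilde(n, x):
    """mu~_n(x_{1:k}) = eps^-1 sum_{x_{k+1:n}} mu(x_{1:n}) F_n(x_{1:n}) for k <= n, else 0."""
    k = len(x)
    if k > n:
        return Fraction(0)
    exts = ("".join(y) for y in product("01", repeat=n - k))
    return sum(mu(x + e) * F(n, x + e) for e in exts) / EPS

xs = ["".join(y) for l in range(7) for y in product("01", repeat=l)]
semi = all(mu_tilde(n, x) >= mu_tilde(n, x + "0") + mu_tilde(n, x + "1")
           for n in range(1, 7) for x in xs)
mono = all(mu_tilde(n, x) >= mu_tilde(n - 1, x) for n in range(2, 7) for x in xs)
total = all(mu_tilde(n, "") <= 1 for n in range(1, 7))
print(semi, mono, total)  # True True True
```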

Let $\mathcal{M} = \{\nu_1,\nu_2,...\}$ be an enumeration of all enumerable semimeasures, $J_k := \{i \leq k : \nu_i \text{ is measure}\}$, and $\delta_k(x) := \sum_{i\in J_k}\varepsilon_i\nu_i(x)$. The weights $\varepsilon_i$ need to be computable and exponentially decreasing in $i$, and $\sum_{i=1}^\infty\varepsilon_i \leq 1$. We choose $\varepsilon_i = i^{-6}2^{-i}$. Note the subtle and important fact that although the definition of $J_k$ is non-constructive, as a finite set of finite objects, $J_k$ is decidable (the program is unknowable for large $k$). Hence, $\delta_k$ is computable, since enumerable measures are computable.
$$D(x) = \delta_\infty(x) = \sum_{i\in J_\infty}\varepsilon_i\nu_i(x) = \text{mixture of all computable measures.}$$
In contrast to $J_k$ and $\delta_k$, the set $J_\infty$ and hence $D$ are neither enumerable nor co-enumerable. We also define the measures $\hat\delta_k(x) := \delta_k(x)/\delta_k(\epsilon)$ and $\hat D(x) := D(x)/D(\epsilon)$. The following Proposition implies posterior convergence of $D$ to $\mu$ on $\mu$-random sequences.
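A toy version of the weights and of $\hat\delta_k$: the sketch checks that the weights $\varepsilon_i = i^{-6}2^{-i}$ sum to well below 1, that each normalized $\hat\delta_k$ is a measure, and the ratio bound $\delta_k(\epsilon)/\delta_{k-1}(\epsilon) \leq 1 + \varepsilon_k/\varepsilon_{i_0}$ used later in the proof of Proposition 11(ii). The enumeration `nus` is hypothetical: odd indices are Bernoulli measures and even indices a non-measure, so $J_k$ is known by construction, unlike in the paper.

```python
from fractions import Fraction
from itertools import product

def eps(i):
    """Weights eps_i = i^-6 2^-i from the text."""
    return Fraction(1, i ** 6 * 2 ** i)

total = float(sum(eps(i) for i in range(1, 60)))
print(round(total, 4))  # about 0.5041, well below 1

# Hypothetical toy enumeration nu_1..nu_6: odd indices are Bernoulli measures,
# even indices a "leaky" semimeasure (not a measure).
def bern(theta):
    return lambda x: theta ** x.count("1") * (1 - theta) ** x.count("0")

def leaky(x):
    return Fraction(3, 8) ** len(x)

nus = {i: bern(Fraction(i, 8)) if i % 2 else leaky for i in range(1, 7)}
J = {i for i in nus if i % 2}   # indices of measures, known here by design

def delta(k, x):
    """delta_k(x) = sum_{i in J_k} eps_i nu_i(x)."""
    return sum(eps(i) * nus[i](x) for i in J if i <= k)

def delta_hat(k, x):
    """Normalized version delta_k(x) / delta_k(empty)."""
    return delta(k, x) / delta(k, "")

xs = ["".join(y) for l in range(6) for y in product("01", repeat=l)]
is_measure = all(delta_hat(k, "") == 1 and
                 all(delta_hat(k, x) == delta_hat(k, x + "0") + delta_hat(k, x + "1")
                     for x in xs)
                 for k in range(1, 7))
# delta_k(empty) / delta_{k-1}(empty) <= 1 + eps_k / eps_{i0}, i0 = min J_{k-1}
ratio_ok = all(delta(k, "") / delta(k - 1, "") <= 1 + eps(k) / eps(min(i for i in J if i < k))
               for k in range(2, 7))
print(is_measure, ratio_ok)  # True True
```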

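Proposition 11 (Convergence of incomputable measure $\hat D$) Let $\mu$ be a computable measure with index $k_0$, i.e. $\mu = \nu_{k_0}$. Then for the incomputable measure $\hat D$ and the computable but non-constructive measures $\hat\delta_k$ defined above, the following holds:
i) $\sum_{t=1}^\infty h_t(\hat\delta_{k_0},\mu) \stackrel{+}{\leq} 2\ln2\cdot d_\mu(\omega) + 3k_0$
ii) $\sum_{t=1}^\infty h_t(\hat\delta_{k_0},\hat D) \stackrel{\times}{\leq} k_0^7\,2^{k_0+d_\mu(\omega)}$

Combining (i) and (ii), using Lemma 9(i), we get $\sum_{t=1}^\infty h_t(\mu,\hat D) \stackrel{\times}{\leq} c_\omega f(k_0) < \infty$ for $\mu$-random $\omega$, where $c_\omega := 2^{d_\mu(\omega)}$ and $f$ is some function of $k_0$ only; this implies $D(b|\omega_{<t}) = \hat D(b|\omega_{<t}) \to \mu(b|\omega_{<t})$. We do not know whether on-sequence convergence of the ratio holds. Similar bounds hold for $\hat\delta_{k_1}$ instead of $\hat\delta_{k_0}$, $k_1 > k_0$. The principal proof idea is to convert the expected bounds of Lemma 4 to individual bounds, using Lemma 10. The problem is that $D$ is not computable, which we circumvent by joining, with Lemma 9, bounds on $\sum_t h_t(\hat\delta_{k-1},\hat\delta_k)$ for $k = k_0, k_0+1, ...$.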

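Proof. (i) Let $H(\omega) := \sum_{t=1}^\infty h_t(\hat\delta_{k_0},\mu)$. $\mu$ and $\hat\delta_{k_0}$ are measures with $\hat\delta_{k_0} \geq \delta_{k_0} \geq \varepsilon_{k_0}\mu$, since $\delta_{k_0}(\epsilon) \leq 1$, $\mu = \nu_{k_0}$, and $k_0 \in J_{k_0}$. Hence, Lemma 4 applies and shows $\mathbf{E}_\mu[\exp(\frac12 H)] \leq \varepsilon_{k_0}^{-1/2}$. $H$ is well-defined and enumerable for $d_\mu(\omega) < \infty$, since $d_\mu(\omega) < \infty$ implies $\mu(\omega_{1:t}) \neq 0$, which implies $\hat\delta_{k_0}(\omega_{1:t}) \neq 0$. So $\mu(b|\omega_{1:t})$ and $\hat\delta_{k_0}(b|\omega_{1:t})$ are well defined and computable (given $J_{k_0}$). Hence $h_t(\hat\delta_{k_0},\mu)$ is computable, hence $H(\omega)$ is enumerable. Lemma 10 then implies $\exp(\frac12 H(\omega)) \stackrel{\times}{\leq} \varepsilon_{k_0}^{-1/2}\cdot 2^{K(\mu,H,\sqrt{\varepsilon_{k_0}}) + d_\mu(\omega)}$. We bound
$$K(\mu,H,\sqrt{\varepsilon_{k_0}}) \stackrel{+}{\leq} K(H|\mu,k_0) + K(k_0) \stackrel{+}{\leq} K(J_{k_0}|k_0) + K(k_0) \stackrel{+}{\leq} k_0 + 2\log k_0.$$
The first inequality holds, since $k_0$ is the index and hence a description of $\mu$, and $\sqrt{\varepsilon_{k_0}}$ is a simple computable function of $k_0$. $H$ can be computed from $\mu$, $k_0$, and $J_{k_0}$, which implies the second inequality. The last inequality follows from $K(k_0) \stackrel{+}{\leq} 2\log k_0$ and the fact that for each $i \leq k_0$ one bit suffices to specify (non)membership to $J_{k_0}$, i.e. $K(J_{k_0}|k_0) \stackrel{+}{\leq} k_0$. Putting everything together we get
$$H(\omega) \stackrel{+}{\leq} \ln\varepsilon_{k_0}^{-1} + [k_0 + 2\log k_0 + d_\mu(\omega)]\,2\ln2 \stackrel{+}{\leq} (2\ln2)\,d_\mu(\omega) + 3k_0.$$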
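(ii) Let $H^k(\omega) := \sum_{t=1}^\infty h_t(\hat\delta_k,\hat\delta_{k-1})$ and $k > k_0$. $\delta_{k-1} \leq \delta_k$ implies
$$\frac{\hat\delta_{k-1}(x)}{\hat\delta_k(x)} \leq \frac{\delta_k(\epsilon)}{\delta_{k-1}(\epsilon)} \leq \frac{\delta_{k-1}(\epsilon) + \varepsilon_k}{\delta_{k-1}(\epsilon)} = 1 + \frac{\varepsilon_k}{\delta_{k-1}(\epsilon)} \leq 1 + \frac{\varepsilon_k}{\varepsilon_{i_0}},$$
where $i_0 := \min\{i \in J_{k-1}\} = O(1)$. Note that $J_{k-1} \ni k_0$ is not empty. Since $\hat\delta_{k-1}$ and $\hat\delta_k$ are measures, Lemma 4 applies and shows $\mathbf{E}_{\hat\delta_{k-1}}[H^k] \leq \ln(1 + \frac{\varepsilon_k}{\varepsilon_{i_0}}) \leq \frac{\varepsilon_k}{\varepsilon_{i_0}}$. Exploiting $\varepsilon_{k_0}\mu \leq \hat\delta_{k-1}$, this implies $\mathbf{E}_\mu[H^k] \leq \frac{\varepsilon_k}{\varepsilon_{i_0}\varepsilon_{k_0}}$. Lemma 10 then implies $H^k(\omega) \stackrel{\times}{\leq} \frac{\varepsilon_k}{\varepsilon_{i_0}\varepsilon_{k_0}}\cdot 2^{K(\mu,H^k,\varepsilon_{i_0}\varepsilon_{k_0}/\varepsilon_k) + d_\mu(\omega)}$. Similarly as in (i) we can bound
$$K(\mu,H^k,\varepsilon_{i_0}\varepsilon_{k_0}/\varepsilon_k) \stackrel{+}{\leq} K(J_k|k) + K(k) + K(k_0) \stackrel{+}{\leq} k + 2\log k + 2\log k_0,$$
hence
$$H^k(\omega) \stackrel{\times}{\leq} \frac{\varepsilon_k}{\varepsilon_{k_0}}\cdot k_0^2k^22^kc_\omega = k_0^8\,2^{k_0}k^{-4}c_\omega, \quad\text{where } c_\omega := 2^{d_\mu(\omega)}.$$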
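Chaining this bound via Lemma 9(ii) we get for $k_1 > k_0$:
$$\sum_{t=1}^n h_t(\hat\delta_{k_0},\hat\delta_{k_1}) \leq \sum_{t=1}^n 3\sum_{k=k_0+1}^{k_1}(k-k_0+1)^2h_t(\hat\delta_{k-1},\hat\delta_k) \leq 3\sum_{k=k_0+1}^{k_1}k^2H^k(\omega) \stackrel{\times}{\leq} 3k_0^8\,2^{k_0}c_\omega\sum_{k=k_0+1}^{k_1}\frac{1}{k^2} \leq 3k_0^7\,2^{k_0}c_\omega.$$
If we now take $k_1 \to \infty$ we get $\sum_{t=1}^n h_t(\hat\delta_{k_0},\hat D) \stackrel{\times}{\leq} 3k_0^7\,2^{k_0+d_\mu(\omega)}$. Finally let $n \to \infty$. □

The main properties allowing for proving $\hat D \to \mu$ were that $\hat D$ is a measure with approximations $\hat\delta_k$, which are computable in a certain sense. $D$ is a mixture over all enumerable/computable measures and hence incomputable.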
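The exponent bookkeeping in the proof of Proposition 11 can be checked with exact arithmetic. The sketch verifies the identity $\frac{\varepsilon_k}{\varepsilon_{k_0}}k_0^2k^22^k = k_0^8\,2^{k_0}k^{-4}$, the chained bound $3\sum_{k=k_0+1}^{k_1}k^2\cdot k_0^8 2^{k_0}k^{-4} \leq 3k_0^7 2^{k_0}$ (with $c_\omega$ and multiplicative constants omitted), and that in part (i) $\ln\varepsilon_{k_0}^{-1} + 2\ln2\,(k_0 + 2\log k_0)$ exceeds $3k_0$ by at most a bounded amount over the tested range. The helper names are our own.

```python
import math
from fractions import Fraction

def eps(i):
    return Fraction(1, i ** 6 * 2 ** i)

def Hk_bound(k, k0):
    """(eps_k / eps_k0) * k0^2 * k^2 * 2^k, i.e. the bound on H^k without c_omega."""
    return eps(k) / eps(k0) * k0 ** 2 * k ** 2 * 2 ** k

# Identity used in the text: the bound equals k0^8 2^k0 k^-4.
ok_identity = all(Hk_bound(k, k0) == Fraction(k0 ** 8 * 2 ** k0, k ** 4)
                  for k0 in range(1, 8) for k in range(k0 + 1, 40))

def chain_sum(k0, k1):
    """3 * sum_{k=k0+1}^{k1} k^2 * (bound on H^k), without c_omega."""
    return 3 * sum(k ** 2 * Hk_bound(k, k0) for k in range(k0 + 1, k1 + 1))

ok_chain = all(chain_sum(k0, k1) <= 3 * k0 ** 7 * 2 ** k0
               for k0 in range(1, 8) for k1 in (k0 + 1, k0 + 10, k0 + 200))

def excess(k0):
    """Part (i): ln eps_k0^-1 + 2 ln 2 (k0 + 2 log2 k0) - 3 k0."""
    return (6 * math.log(k0) + k0 * math.log(2)
            + 2 * math.log(2) * (k0 + 2 * math.log2(k0)) - 3 * k0)

print(ok_identity, ok_chain, max(excess(k0) for k0 in range(1, 5000)))
```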
6 M.L.-Converging Enumerable Semimeasure W

The next step is to enlarge the class of computable measures to an enumerable class of semimeasures, which are still sufficiently close to measures in order not to spoil the convergence result. For convergence w.p.1 we could include all semimeasures (Theorem 3). M.L.-convergence seems to require a more restricted class. Included non-measures need to be zero on long strings. We convert semimeasures $\nu$ to "quasimeasures" $\tilde\nu$ as follows:
$$\tilde\nu(x_{1:n}) := \nu(x_{1:n}) \text{ if } \sum_{y_{1:n}\in\mathcal{X}^n}\nu(y_{1:n}) > 1 - \tfrac1n \quad\text{and}\quad \tilde\nu(x_{1:n}) := 0 \text{ else.}$$
If the condition is violated for some $n$, it is also violated for all larger $n$; hence with $\nu$ also $\tilde\nu$ is a semimeasure. $\tilde\nu$ is enumerable if $\nu$ is enumerable. So if $\nu_1,\nu_2,...$ is an enumeration of all enumerable semimeasures, then $\tilde\nu_1,\tilde\nu_2,...$ is an enumeration of all enumerable quasimeasures. The properties that are important for us are that $\tilde\nu_i \leq \nu_i$, and if $\nu_i$ is a measure, then $\tilde\nu_i = \nu_i$, else $\tilde\nu_i(x) = 0$ for sufficiently long $x$. We define the enumerable semimeasure
$$W(x) := \sum_{i=1}^\infty\varepsilon_i\tilde\nu_i(x), \quad\text{and note that}\quad D(x) = \sum_{i\in J}\varepsilon_i\nu_i(x) \text{ with } J := \{i : \nu_i \text{ is measure}\},$$
with $\varepsilon_i = i^{-6}2^{-i}$ as before.
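A minimal sketch of the conversion $\nu \mapsto \tilde\nu$ on binary strings, assuming the threshold $1 - \frac1n$ as written above and (our convention) leaving $\tilde\nu(\epsilon) = \nu(\epsilon)$. The `defective` semimeasure is hypothetical: its level mass is $\frac{9}{10}$ from length 1 on, so $\tilde\nu$ agrees with $\nu$ up to length 9 and vanishes from length 10 on, while a measure is left unchanged.

```python
from fractions import Fraction
from itertools import product

def quasi(nu, alphabet="01"):
    """Quasimeasure nu~ of nu: nu~(x) = nu(x) if the total nu-mass of level
    n = len(x) exceeds 1 - 1/n, and 0 otherwise (level 0 is kept)."""
    cache = {}
    def mass(n):
        if n not in cache:
            cache[n] = sum(nu("".join(y)) for y in product(alphabet, repeat=n))
        return cache[n]
    def nu_tilde(x):
        n = len(x)
        if n == 0 or mass(n) > 1 - Fraction(1, n):
            return nu(x)
        return Fraction(0)
    return nu_tilde

def uniform(x):
    return Fraction(1, 2 ** len(x))

def defective(x):
    """Hypothetical semimeasure that keeps only 9/10 of the mass after step 1."""
    return Fraction(1) if x == "" else Fraction(9, 10) / 2 ** len(x)

u_t, d_t = quasi(uniform), quasi(defective)
xs = ["".join(y) for l in range(11) for y in product("01", repeat=l)]
print(all(u_t(x) == uniform(x) for x in xs))                   # measures are unchanged
print(d_t("0" * 9) > 0, d_t("0" * 10) == 0)                     # cut off from length 10 on
print(all(d_t(x) >= d_t(x + "0") + d_t(x + "1") for x in xs))  # still a semimeasure
```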



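Proposition 12 (Convergence of enumerable $W$ to incomputable $D$) For every computable measure $\mu$ and for $\omega$ being $\mu$-random, the following holds for $t\to\infty$:
$$\text{(i) } \frac{W(\omega_{1:t})}{D(\omega_{1:t})} \to 1, \qquad \text{(ii) } \frac{W(\omega_t|\omega_{<t})}{D(\omega_t|\omega_{<t})} \to 1, \qquad \text{(iii) } W(a|\omega_{<t}) - D(a|\omega_{<t}) \to 0\ \ \forall a \in \mathcal{X}.$$

The intuitive reason for the convergence is that the additional contributions of non-measures to $W$ absent in $D$ are zero for long sequences.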

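Proof. (i)
$$D(x) \leq W(x) = D(x) + \sum_{i\notin J}\varepsilon_i\tilde\nu_i(x) \leq D(x) + \sum_{i=k_x}^\infty\varepsilon_i\tilde\nu_i(x), \qquad (8)$$
where $k_x := \min_i\{i \notin J : \tilde\nu_i(x) \neq 0\}$. For $i \notin J$, $\tilde\nu_i$ is not a measure. Hence $\tilde\nu_i(x) = 0$ for sufficiently long $x$. This implies $k_x \to \infty$ for $\ell(x) \to \infty$, hence $W(x) - D(x) \to 0$. To get convergence in ratio we have to assume that $x = \omega_{1:n}$ with $\omega$ being $\mu$-random, i.e. $c_\omega := \sup_n\frac{M(\omega_{1:n})}{\mu(\omega_{1:n})} < \infty$. Then
$$\tilde\nu_i(x) \leq \nu_i(x) \leq \frac{1}{w_{\nu_i}}M(x) \leq \frac{c_\omega}{w_{\nu_i}}\mu(x) \leq \frac{c_\omega}{w_{\nu_i}\varepsilon_{k_0}}D(x).$$
The last inequality holds, since $\mu$ is a computable measure of index $k_0$, i.e. $\mu = \tilde\nu_{k_0} = \nu_{k_0}$. Inserting $1/w_{\nu_i} \leq c\cdot i^2$ for some $c = O(1)$ and $\varepsilon_i$ we get $\varepsilon_i\tilde\nu_i(x) \leq \frac{c\,c_\omega}{\varepsilon_{k_0}}i^{-4}2^{-i}D(x)$, which implies $\sum_{i=k_x}^\infty\varepsilon_i\tilde\nu_i(x) \leq \varepsilon'_xD(x)$ with $\varepsilon'_x := \frac{c\,c_\omega}{\varepsilon_{k_0}}\sum_{i=k_x}^\infty i^{-4}2^{-i} \to 0$ for $\ell(x) \to \infty$. Inserting this into (8) we get
$$1 \leq \frac{W(x)}{D(x)} \leq 1 + \varepsilon'_x \to 1 \quad\text{for } \mu\text{-random } x.$$

(ii) Obvious from (i) by taking a double ratio.

(iii) Let $a \in \mathcal{X}$. From $W(xa) \geq D(xa)$ ($W \geq D$) and $W(x) \leq (1+\varepsilon'_x)D(x)$ (i) we get
$$W(a|x) \geq (1+\varepsilon'_x)^{-1}D(a|x) \geq (1-\varepsilon'_x)D(a|x)\ \ \forall a \in \mathcal{X}, \quad\text{and}$$
$$1 - W(a|x) \geq \sum_{b\neq a}W(b|x) \geq (1-\varepsilon'_x)\sum_{b\neq a}D(b|x) = (1-\varepsilon'_x)(1 - D(a|x)),$$
where we used in the second line that $W$ is a semimeasure and $D$ is proportional to a measure. Together this implies $|W(a|x) - D(a|x)| \leq \varepsilon'_x$. Since $\varepsilon'_x \to 0$ for $\mu$-random $x$, this shows (iii). $h_x(W,D) \leq \varepsilon'_x$ can also be shown. □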
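Step (iii) only uses $W \geq D$, $W(x) \leq (1+\varepsilon'_x)D(x)$, the semimeasure property of $W$, and additivity of $D$. The sketch below generates synthetic values satisfying exactly these constraints at a single node $x$ (with $D(x) = 1$ and a ternary alphabet; all numbers are made up) and checks $|W(a|x) - D(a|x)| \leq \varepsilon'_x$.

```python
import random

def random_instance(rng, m=3):
    """Synthetic D and W at a node x with D(x) = 1: W(xa) >= D(xa),
    sum_a W(xa) <= W(x), and D(x) <= W(x) <= (1 + eps) D(x)."""
    eps_x = rng.uniform(0.0, 0.5)
    d = [rng.random() + 1e-9 for _ in range(m)]
    s = sum(d)
    d = [v / s for v in d]                        # D(xa) = D(a|x), sums to 1
    W_x = rng.uniform(1.0, 1.0 + eps_x)
    r = [rng.random() + 1e-9 for _ in range(m)]
    slack = (W_x - 1.0) * rng.random()            # extra mass W puts on children
    W_xa = [da + slack * ri / sum(r) for da, ri in zip(d, r)]
    return eps_x, d, W_x, W_xa

rng = random.Random(0)
ok = True
for _ in range(10_000):
    eps_x, d, W_x, W_xa = random_instance(rng)
    # Proposition 12(iii): |W(a|x) - D(a|x)| <= eps for every symbol a
    ok = ok and all(abs(w / W_x - da) <= eps_x + 1e-12 for w, da in zip(W_xa, d))
print(ok)  # True
```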

Speed of convergence. The main convergence Theorem 8 now immediately follows from Propositions 11 and 12. We briefly remark on the convergence rate. Lemma 4 shows that $\mathbf{E}[\sum_t h_t(X,\mu)]$ is logarithmic in the index $k_0$ of $\mu$ for $X = M$ ($\ln w_\mu^{-1} \stackrel{+}{\leq} 2\ln k_0$), but linear for $X \in \{W, D, \hat\delta_{k_0}\}$ ($\ln\varepsilon_{k_0}^{-1} = k_0\ln2 + 6\ln k_0$). The individual bounds for $\sum_t h_t(\hat\delta_{k_0},\mu)$ and $\sum_t h_t(\hat\delta_{k_0},\hat D)$ in Proposition 11 are linear and exponential in $k_0$, respectively. For $W \to D$ we could not establish any convergence speed.
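To put numbers on this comparison, the sketch below evaluates $2\ln k_0$ (the order of $\ln w_\mu^{-1}$ for $M$, additive constants dropped) against $\ln\varepsilon_{k_0}^{-1} = k_0\ln2 + 6\ln k_0$ for the weights used for $W$, $D$, and $\hat\delta_{k_0}$. It is a back-of-the-envelope illustration of "logarithmic versus linear", not a statement about actual convergence times.

```python
import math

def ln_inv_w_M(k0):
    """Order of ln w_mu^-1 for X = M: at most 2 ln k0 (additive constant dropped)."""
    return 2 * math.log(k0)

def ln_inv_eps(k0):
    """ln eps_k0^-1 = k0 ln 2 + 6 ln k0 for eps_i = i^-6 2^-i (X = W, D, delta_k0)."""
    return k0 * math.log(2) + 6 * math.log(k0)

for k0 in (10, 100, 1000):
    print(k0, round(ln_inv_w_M(k0), 1), round(ln_inv_eps(k0), 1))
```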
Finally we show that $W$ does not dominate all enumerable semimeasures, as the definition of $W$ suggests. We summarize all computability, measure, and dominance properties of $M$, $D$, $\hat D$, and $W$ in the following theorem:

Theorem 13 (Properties of $M$, $W$, $D$, and $\hat D$)

(i) $M$ is an enumerable semimeasure, which dominates all enumerable semimeasures. $M$ is not computable and not a measure.

(ii) $\hat D$ is a measure, $D$ is proportional to a measure, both dominating all enumerable quasimeasures. $D$ and $\hat D$ are not computable and do not dominate all enumerable semimeasures.

(iii) $W$ is an enumerable semimeasure, which dominates all enumerable quasimeasures. $W$ is not itself a quasimeasure, is not computable, and does not dominate all enumerable semimeasures.

We conjecture that $D$ and $\hat D$ are not even approximable (limit-computable), but lie somewhere higher in the arithmetic hierarchy. Since $W$ can be normalized to an approximable measure M.L.-converging to $\mu$, and $D$ was only an intermediate quantity, the question of approximability of $D$ seems not too interesting.



On the Convergence of Universal Semimeasures 






7 Conclusions

We investigated a natural strengthening of Solomonoff's famous convergence theorem, the latter stating that with probability 1 (w.p.1) the posterior of a universal semimeasure $M$ converges to the true computable distribution $\mu$ ($M \stackrel{\text{w.p.1}}{\longrightarrow} \mu$). We answered partially negatively the question of whether convergence also holds individually for all Martin-Löf (M.L.) random sequences ($\exists M : M \stackrel{\text{M.L.}}{\not\longrightarrow} \mu$). We constructed random sequences $\alpha$ for which there exist universal semimeasures on which convergence fails. Multiplicative dominance of $M$ is the key property to show convergence w.p.1. Dominance over all measures is also satisfied by the restricted mixture $W$ over all quasimeasures. We showed that $W$ converges to $\mu$ on all M.L.-random sequences by exploiting the incomputable mixture $D$ over all measures. For $D \stackrel{\text{M.L.}}{\longrightarrow} \mu$ we achieved a (weak) convergence rate; for $W \stackrel{\text{M.L.}}{\longrightarrow} D$ and $W \stackrel{\text{M.L.}}{\longrightarrow} \mu$ only an asymptotic result.

The convergence rate properties w.p.1 of $D$ and $W$ are as excellent as for $M$. We do not know whether $D/\mu \stackrel{\text{M.L.}}{\longrightarrow} 1$ holds. We also do not know the convergence rate for $W \stackrel{\text{M.L.}}{\longrightarrow} D$, and the current bound for $D \stackrel{\text{M.L.}}{\longrightarrow} \mu$ is double exponentially worse than for $M \stackrel{\text{w.p.1}}{\longrightarrow} \mu$. A minor question is whether $D$ is approximable (which is unlikely). Finally, there could still exist universal semimeasures $M$ (dominating all enumerable semimeasures) for which M.L.-convergence holds ($\exists M : M \stackrel{\text{M.L.}}{\longrightarrow} \mu$?). In case they exist, we expect them to have particularly interesting additional structure and properties. While most results in algorithmic information theory are independent of the choice of the underlying universal Turing machine (UTM) or universal semimeasure (USM), there are also results which depend on this choice. For instance, one can show that $\{(x,n) : K_U(x) < n\}$ is tt-complete for some $U$, but not tt-complete for others [MP02]. A potential $U$ dependence also occurs for predictions based on monotone complexity [Hut03d]. It could lead to interesting insights to identify a class of "natural" UTMs/USMs which have a variety of favorable properties. A more moderate approach may be to consider classes $\mathcal{C}_i$ of UTMs/USMs satisfying certain properties $\mathcal{P}_i$ and showing that the intersection $\bigcap_i\mathcal{C}_i$ is not empty.

Another interesting and potentially fruitful approach to the convergence problem at hand is to consider other classes of semimeasures $\mathcal{M}$, define mixtures $M$ over $\mathcal{M}$, and (possibly) generalized randomness concepts by using this $M$ in Definition 5. Using this approach, [Hut03b] showed that convergence holds for a subclass of Bernoulli distributions if the class is dense, but fails if the class is gappy, suggesting that a denseness characterization of $\mathcal{M}$ could be promising in general.

Acknowledgements. We want to thank Alexey Chernov for his invaluable help. 